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ABSTRACT 


The problem discussed is the exact computer 
solution of systems of linear equations whose coefficients 
are integers or polynomials over the integers. Two 
methods, the multi-step and congruential algorithms, are 
described and compared. The number of single-precision 
operations required to perform each is found, and it is 
concluded that the congruential method is superior except 
perhaps for small systems. 

In previous congruential algorithms for the poly- 
nomials, certain 'bad'" primes which occur are discarded. 
In this study it is shown that these primes need not be 
discarded. In addition, a theorem for effectively term- 
inating the Chinese Remainder formula for polynomials 


is given. 
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CHAPTER ONE 


Introduction 


The problem to be considered in this study is the 
exact computer solution of systems of linear equations 
with coefficients in the integers I , or in I(x) 9+ + 9X) ; 
the multivariate polynomials over the integers. 

Symbolic solutions are required in connection with 
a wide range of problems. The applications of flowgraphs 
serve aS an illustration. A flowgraph is a labelled, 
directed graph with a starting node and a final node. 
The generating function G of a flowgraph F having 
starting node s_ , final node f , and transmission matrix 
T can be found by solving the system of n equations 
Ax = b for x, , where A = (let a hoes arandeb ie 
the s'th unit vector. Problems arising in electric 
circuit theory, in the analysis of Markov chains, in 
graph theory, and in coding theory can all be solved by 
finding such a generating function. The graph in Figure l, 
for example, may be considered a flowgraph with starting 
node s=l , final node f=6 , and transmission (or 
connection) matrix BA Bau ; cae is the label on the 
path from node i to node j . Solution of the system 


of equations 


edt e2 ybuta aif? a2 berebianoo sd oF moidoyq odT 
sholtsupe tssnii to exeveys to moituion set¥qmoa toaxe 
; Chere ee aT ni zo , I exsge7at eft nt sinsioitisos dziw 
<t1egstai elt revo eletmonylog steiaavisium eft ; 
dtiw noltosanoo ai Ostitipst ets etoituloe oiloday2 
argqstywolt to avoizsoifqus eiT .eavidorg Yo syns x ablw o : 
gboileadel 4 ei dqetgwolt A .aokivecxtaulit ms 25 svaies 
-sbon loath « bne shon gritiste 6 atiw dqety betoeteb 
arived I iqergwolt 5s to 8 nolzpnu? gaitsiteneg sit 
xinvwam noiseimanest bee , i shorn feria, a 4sbon anktiste 
enoltsups o to meteye sit gnivies yd bruel sd nso T 
ai d hrs (eee a) 2 stenw , Pas "ot od @ KA 
cintosf{s mi gnteics ensido1l .«otoev sinw Ade ode 
ai ,2nteds voAtsM Yo zteylens sft nk ,vnoedtt tiwoyts -_ 
vd bevios od ifs nso yrost> gnifoo at baa .yromdty dqety - 
(I awwgll ni fqetg of? nolan? gnivevsaey s dovegaibais 
grituste driw rgerywolt s bersbizago od yest ,siqmaxs 402 
40) motesimansit bas , <3 sbon fant? , {<2 show 
ent no Iedel edr at te? ¢ 3%) 30"? wittem (nottooanoD 
Srrere Sit Le Keksucoe - t sbon oF £ ‘eben mort dgeq 
| anaes te 


ie) ee Stowe AO ee x 3 
-a 1 -b -a -a 0 Xo 0 
epevea its. 0 0m =p a 0 
Ciba 5 GC dh ; 0 
0 0 -a 0 l -a Xo 0 
Oo OP | CON — 0 
yields 
Beas ere ee Sone 


TS Aika cre ener (are ea Dee gy ARC Lae EE REC A 


-2ab-a SEES aor eels Bea) 


Two possible interpretations of this flowgraph, 
and of G , follow. 

If the graph in Figure 1 is interpreted as a Markov 
graph, where eae = probability of a transition from 
Staten 1 ato. | state’ 7), then the kth coerficirent, of the 
expansion of G represents the probability of a trans- 


ition from starting state 1 to final state 6 ink steps. 


For example, 1f “a= = Zocands b= ols ca é 27> = SZ 5 chen 


The probability of reaching state 6 in, say, 5 steps is 


found by expanding G to obtain the coefficient of 2° ’ 


which is 0.05273437 . 
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FIGURE 1. FLOWGRAPH 1 


FIGURE 2(a). MACHINE FIGURE 2(b). FLOWGRAPH 2 


If, on the other hand, the graph in Figure 1 is 
aneadjacency “graph in-which "as" ®2 2978*and as = Zz 
indicates that there is a path from node i to node j 
then the k'th coefficient of G is the number of paths 
or length “*k strom starting node "1: “to -tinalr node =o =. 
In this case, 


3 4 5 
+ + 
G= = am 22 = ye Gr oe + bz? Ls de a : 


-22° - 2z - uzt + 1] 


Piere are no=patirs’ of length "Oy 15%or'*2 “=from*node Ff 
commode =16--,*) “of lengths") @=2'0r  lenpti 4s ana so "On. 
As another example, consider the problem of find- 
ing the number of n-digit binary sequences with exactly 
r pairs of adjacent 1's andno adjacent 0's . A finite 
state machine which recognizes sequences having no 
adjacent O's is shown in Figure 2(a). In Figure 2(b), 
the corresponding flowgraph is shown; the branches are 
labelled with bivariate monomials xo 1yS2 » where e, = a 
mnidreates “ther occurrence of a ="0'"°or "1 F*5 orn 0 
indieates no occurrence of a 0" or’ 1°", eo = 1 indicates 


the occurrence of the sequence ll , and en = 0 indicates 


no occurrence of 11 . For this flowgraph, 


2 
Ge: 1 - (y-2)x - Gy=1)x eta} Aas. & (y+2)x? Ps 
LyX ex 
a (2y+y7+2)x°? tee , 


vo soe Ts 


ee tt 
= gt ba. s <@ 56 Molde nt dager yansontbs as 

« t ae i sbon moet digg & et exsdt 20d% gotsokbat 
sitaq Yo vedmuin odt i 9 Yo daetotiteco dr'A ert mode 
. @ bon lentil ot { sbon gatcasze mowt A Adgael to 
,~oeso aids al 


@ 
: we t Pap + “ene & 2 
tL? 


sv “sS = “sS- 


[ sbon moxt S$ 0, ,0 digas to antsq on ers oxaiT : 
0G on bos , # dtgnel to $ =, € digmeal to £, 8 sbon of - 
-~bait to maidew eit tebienoo ,eiqmsxe ‘teritoas 2A ; 

yvitosxs dtiw asoneupese ytsitid tigib-n to isdawn oft gmk 
efinit A . @'0 Sas9ne¢ie on bre e'f Jasos(be to etisg 4 : 


on grivad exsaaupen aesingoos: dsidw anidosm siste : 
,(4)5 evugit nl .(a)S suvght ot aware ef 3'0 tneosthe 
ers eotonsid of? ; wore ei dqetgwolt gaibnoqestieo sft 
is x? erertw , 8919, elsimonem etsitavid dsiw belledsl 
C= ;8 » £ 10 © & to soneriwos0 sft estsotbal 
esteoibni £= 4a, f£ 1 8 6 Yo sonertIS00 on Beteokbal 
sstacibat 0 > .» bas , Li sonsupse aft to sonerunDS 9d? 
pdqwvrgwolt eis 207 » LL Yo sonerwes0 on 


——e i ee. 


The coerricrent of xTyt is the number of sequences of 
lenge tiigin @havingwexactlysor pains of ‘adjacent. 1's)... 
There are, for example, 2 sequences of length 3 having 
exactive im pain Ofwadjacent, “1's -and#no adjacent 10' suet. 

Flowgraphs are used in coding theory to enumerate 
comma-free code words, and in electric circuit theory to 
find the transmission of a circuit. The examples above, 
and others, are given by Lipson [20]. 

The traditional method for solving systems of linear 
equations is Gaussian elimination. In this algorithm, an 
element as, of the coefficient matrix is reduced to zero 
by subtracting from row i , (a.)/apy) xX pivot row k . 
This method cannot be used for integer or polynomial systems 
Since division is not defined in an integral domain. 

To overcome this problem, Rosser [25] has dis- 
cussed an obvious modification of the Gaussian elimination 
method whereby one successively subtracts ay, * row Al 
from as, * row k . The method no longer requires division. 
However it is unsatisfactory because the precision of the 
integers (or the degree of the polynomials) doubles at each 
step. As a result of this exponential growth of coefficients, 
the method requires the equivalent of O(n +2") single- 
precision operations to solve a system of n linear 
equations with single-precision integer coefficients. 


Three alternatives to this method have been proposed. 
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The growth of coefficients can be minimized by 


finding the greatest common divisor of a and aj, at 


kk 
each step. Algorithms which find the g.c.d. of each 
column of the coefficient matrix (and perform row oper- 
ations accordingly) have been investigated by Rosser [25], 
Blankinship [6,7] and Bradley [11,12]. Although the 
growth of coefficients is linear, the computation required 
to find the g.c.d.'s is large, and the complexity of the 
method is o(n®) ° 

A compromise to removing all common factors is 
found in the multi-step algorithms. These algorithms 
are a modification of fraction-free elimination which 
remove at each step a factor which can be shown to occur. 
Although the largest factor is not always removed as in 
the g.c.d. algorithm, the rate of growth of the coeffic- 
ients remains linear, and now no work is involved in 
finding the factor which is known before-hand. Asa 
consequence, multi-step methods are superior to g.c.d. 
methods. Multi-step algorithms are discussed in Chapter 
Two and it is shown that the complexity is O(n?) or 
O(n" log n) , depending on the multi-precision arithmetic 
algorithms used. 

Another alternative exists: the congruential 
algorithms solve the given system modulo a number of 
single-precision primes and construct a solution from the 


modular solutions by means of the Chinese Remainder 
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Theorem. Several solutions have to be found, but they 

can be computed using single-precision operations only. 
Surprisingly, the requisite number of solutions can be 
found, and the solution constructed, in only O(n") 
operations. This fact is shown in Chapter Three, and 

Some improvements to the existing algorithms are suggested. 

There has been some controversy over which of the 
latter two methods, multi-step or congruential, is actually 
Superior. In this study, both methods are investigated 
with the objective of determining which should be the focus 
of future research. In Chapter Two the one and two-step 
algorithms are considered in detail; exact operation counts 
for the integer case are found, and the asymptotic 
behavious of the polynomial case is established. The 
congruential algorithms are similarly treated in Chapter 
Three. 

Finally, in Chapter .Four 10 is shown thatthe con= 
gruential method is asymptotically superior to the multi- 
step methods, both for systems with integer coefficients 
and systems with polynomial coefficients. It is also 
established that for systems of n linear equations with 
single-precision integer coefficients, the number of 
operations required by the congruential methods will 
always be less than those required by the two-step algor- 
ithm if n is greater than about five. Similar results 


in the polynomial case are difficult to establish for 
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small n because the number of operations depends not 
only on n_ but also on the number of variables, the 
degrees of these variables, and the number of terms. 
Assumptions made in order to simplify the relationships 
among these factors, assumptions which typify real- 
world problems, tend to make results unreliable for 
small n . However, generalizing on the results 
obtained for the integer case, the congruential method 
Should be superior to the multi-step methods for similar, 
and perhaps even smaller, values of n . Experimental 
results are necessary in establishing more definitive 
conclusions,and McClellan [23] is currently working 

On such experiments. 

Since congruential algorithms are superior to 
multi-step algorithms for all but small n_, special 
attention is given to these algorithms. One disadvantage 
of congruential algorithms is that certain modular 
solutions may have to be discarded if 'bad' primes occur. 
The primes referred to here are the moduli with respect 
to which the system is solved. In Chapter Three a major 
effort is made to deal with such primes. Also in 
Chapter Three a theorem for effectively terminating the 
Chinese Remainder formula for polynomials is given. The 
results obtained there are perhaps the highlights of 
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CHAPTER TWO 


The HuLti=otep Methods 


Discovery of the one-step fraction-free algorithm is 
attributed to Jordan (1838-1922). More recently, Bodewig 
[8], Fox [15], and Luther and Guseman [22] have discussed 
this method of solving exactly a system of linear equations 
with integer coefficients. The two-step algorithm was 
proposed by Bareiss [1,2,3,4] and has also been discussed 
for systems either with integer or with Bogert co- 
efficients by Lipson [20]. 

In section 2.1 the algorithms are discussed. In 
section 2.2 the exact number of operations required to 
solve an integer system by the one and two-step algorithms 
is calculated, while in section 2.3 the approximate number 
of operations required for systemswith polynomial coef- 
ficients is determined. The order of complexity of the 
multi-step algorithms is considered in section 2.4. 

The multi-precision arithmetic required by these 
algorithms can be performed using either the 'classical' 
algorithms such as those described by Knuth [19, section 
4.3.1], or various 'fast' techniques which have been 
recently developed. In sections 2.2 and 2.3, the analysis 


assumes the 'classical' algorithms are used; the order of 
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complexity which can be achieved using 'fast' algorithms 


is discussed in section 2.4. 


20 +The“Al gorithms 

The one-step algorithm is a fraction-free Gaussian 
elimination algorithm which reduces the rate of growth of 
the coefficients by removing at each step a factor which 
is known to accur. To triangularize the n by (ntl) 


augmented matrix [A,b] = (ass of a system of n 


linear equations Ax = b , the algorithm is as follows: 


(-1) _ ; 
ang =oelee <5 


BORG Keb 6 2igue see llm ie 6 
For i-kt] .Jch2s5 ew eeisre ys 
FOr joktly kez oe se Nl, T 5 
pe 1 eCk= 1) ae mise 1)_(k-1) 


Geyeyeckk val bal ik 
Ue peer a ee Tee 


Seen: 


where it is understood that 


atkvl st fori <ick , 1sjént1 
ae = +) ace _ f 
= 0 for Kti<i<n 5 i<)sk 


The division is exact (i.e. fraction-free) for coeffi- 


cients in any integral domain. (See Lipson [20]). 
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The two-step algorithm performs in one iteration two 
steps of the one-step algorithm. The pivot row k is 


calculated as in the one-step method, while for rows 
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If n is even, an extra pivot row is calculated at 


the end. Noting that oe is calculated once per 


iteration and ae and ae once per row, the two- 


step algorithm, then, is as follows: 
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As noted by Lipson [20] and Bareiss [3,4], back-sub- 
stitution can also be carried out with a fraction-free 


algorithm. In general, the components of the solution 


vector x are not integers (or polynomials). However, 


by Cramer's rule x = y/d where d= determinant of A 


4(n-l) 


adjoint 
2 nn 


and ye=A Since d= (see Lipson 
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algorithm is: 
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Then x = Ne ae! ane - If a reduction to lowest terms 
is desired, a greatest common divisor algorithm must be 
used. 

Multi-step algorithms which perform more than two 
steps in one iteration have been investigated by Bareiss 
and Mazukelli [5]. In general, a k-step algorithm for the 
solution of integer systems uses (4) times as many 7 
multiplications/divisions as the one-step algorithm; the 
number of additions/subtractions does not change signif- 
icantly. In section 2.2, for example, we see that the 
two-step algorithm requires two-thirds as many multipli- 
cations/divisions as the one-step algorithm. As the step 
size grows, however, the amount of computation saved 
decreases while the pivoting algorithm needed to ensure 
that all divisors are non-zero becomes increasingly 


complex. In the following sections, only the one and two- 


step algorithms will be considered. 


2ucw Lmteger Systems 
oo 4 Growth of Coetiacients 


Before the number of operations required to perform 
the one and two-step algorithms can be found, it is 


necessary to determine the size of the integers produced 
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It has been proved (Lipson [20], Bareiss [3,4]) that 


each ia is the determinant of a k'th order sub- 
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2e267meQne-olep Elimination 

In this section it is assumed that 'classical' 
algorithms for performing multi-precision arithmetic are 
used. 

Let C[k33] represent the number of single precision 
operations required to multiply a k-precision integer by 
a@ j-precisionjinteger, and let C. and C,. be similarly 
defined. Let A represent single-precision addition or 
subtraction operations, and let M represent single- 


precision multiplication or division operations. 


Then using the classical algorithms: 
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precision =k , then triangularization by the one-step 
algorithm requires the following operations: 
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precision =k  , then triangularization by the two-step 


algorithm requires the following operations: 
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and (m-1) is the maximum value of a single-precision 


integer. 
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2.3.3 Two-Step Elimination 
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Back-substitution is performed as in the one-step 
algorithm. 

The order of complexity of the two-step algorithm 
is the same as that of the one-step algorithm - ao dae 
However, the coefficient is smaller. In the case of uni- 
variate polynomials, for example, M is approximately 
— aon de and A is approximately — wend: . These 
totals are each one-third smaller than those obtained for 
the one-step algorithm. 

A similar reduction in the operations required can 
be observed for systems which satisfy wn <1 . For such 
systems the two-step algorithm requires: 
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2.4 Fast Arithmetic 

Bareiss [3,4] has suggested that 'fast' algorithms 
for the multiplication of multi-precision numbers can be 
used to reduce the order of complexity of the multi-step 
methods. He concludes that with such techniques, the 
multi-step algorithms use O(wn*) operations in the 
integer case. To the contrary, in this section it is 
shown that fast techniques produce an algorithm which is, 
at best, O(wn log ae 

Several algorithms which multiply two k-precision 
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been discovered. The first of these was an 0(k+°8? 35 
algorithm suggested by A. Karatsuba in 1962. Shortly 
afterwards A.L. Toom produced a computer-circuitry scheme 
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4.3.3, "How Fast Can We Multiply?"). 
As Bareiss points out, Schdénhage has speculated 
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Schonhange, then, is suggesting an optimum of ck log k 
operations to perform k-precision multiplication. 
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numbers can be accomplished in k steps "if we leave the 
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domain of conventional computer programming and allow 
ourselves to build a computer which has an unlimited 
number of components all acting at once". Such an 
algorithm would give O(wn*) multi-step methods. For 
conventional computers, however, the multi-step algorithms 


are of order wn' LOS NM 4 at DESt. 


CHAPTER THREE 


Congruential Methods 


The congruential method of solving a system of 
linear equations with integer coefficients was first 
proposed in 1961 by Takahasi and Ishibashi [27]. Since 
that time Newman [24], Borosh and Fraenkel [10], Howell 
and Gregory [16,17,18], Cabay [13], and Bareiss [3,4] 
have suggested improvements. Some attention ae also 
been given, notably by McClellan [23], to the case of 
polynomial coefficients. 

To solve a system of linear equations Ax = b 
with integer (or polynomial) coefficients, the first 
step in the procedure is to solve the system by Gaussian 
elimination modulo a number of primes Py oPoseessPye + 
The true solution is then constructed from the t modular 
solutions by means of the Chinese Remainder Theorem. 

In general, the components of the solution vector x 
are not integers (or polynomials). By Cramer's rule, 
however, x = y/d , where d= determinant of A _ and 
y = oe LEIS are ‘integer (or polynomial) quantities. 
The method, then, 1s to compute for each k , iJ<k<t , 


the solution (dy sy) of 
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where Ay. = A (mod pp), BL = b (mod P,) and 


O.| 
" 


i > on siemod P;) 


ny 


The unique integer (or polynomial) d and the unique 
vector of integers (or polynomials) y satisfying (**) 
can then be constructed by means of the Chinese Remainder 
Theorem if the system (*) has been solved for a sufficient 
number of primes PK 

The congruential method of solving a system of 
linear equations with integer coefficients is discussed in 
section 3.1 and detailed operation counts for the method 
are given. In section 3.2, systems with polynomial coef- 
ficients are considered and approximate operation counts 
are found. 

The congruential method is often hampered by the 
occurrence of what has come to be known as 'bad' primes. 
Recently it has been shown that these primes are not 'bad' 
in the case of systems with integer coefficients. For 
systems with polynomial coefficients, it is shown that 
one type of prime is not 'bad', that subject toa 
certain natural condition, a second type need not occur, 
and that the third type can be used with little or no 
loss of computation. 
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terminating the Chinese Remainder formula for poly- 


nomials is given. 


ciak iiceeen Systems 
Seal Lhe=Method 


When a system of linear equations Ax = b with 
integer coefficients is solved by the congruential method, 
the system is solved modulo t primes Down 1skst . 
Suppose that the Py have been selected. For each prime 


Py, the quantity (dy 5¥;) satisfying 


O. 
" 


d (mod Py) 


y (mod P,) 


1s determined by solving the system Ax = b_ using 
Gaussian elimination modulo PK with partial pivoting. 


The algorithm for triangularization becomes: 


For Nl. =glsa2vwaes tics 5 
POD eo ie Nt 2i5 se a5) SG 


POLee rt eee cies tit y's 


Cheesy (h-1),_C€h-1),-1_(€h-1) 
a acai (arn ) 


i Pet 


By partial pivoting, we mean that if at any stage the 


diagonal pivot element A ie becomes zero, an attempt 


is made to rearrange the remainding rows to obtain a non- 
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zero pivot. The determinant of A (mod Py.) is immed- 
iately available as the product of the pivot elements. 
To determine vp » back-substitution modulo Py, is 
first used to produce a vector x satisfying 

AX, = b (mod pp) . Then X also satisfies 


X =a Cod P}) . But since y = dx , it follows that 


Ye a kk (mod Py) e 


The conditions which should be imposed on the primes 


P, now become apparent. Each Py should be: 

(1) less than or equal to the word-size of the 
computer so that all modular operations are 
performed on single-precision integers. 

(2) as large as possible (within the restriction 
imposed by (1)) so that a minimum number of 
moduli can be used. 

(3) prime so that esos (mod Py) is defined 

(h-1) 
whenever ary AS Os 
There are two algorithms available for finding ae 
(mod p) . By Fermat's Theorem, Ay ea ae (mod p) . 


However, Collins [14] has shown that it is about twice as 


efficient to use the extended Euclidean algorithm, which 


' ' 1 1 
finds integers a and p such that aa + pp = (a,p) 
' t 


Since a and p are relatively prime, aa + pp =1 . 


1 
Then A, = 1 (mod pJ) ands a is the required inverse. 
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The algorithm (see Knuth [19], page 302) is as follows: 


(u,.U,) + (O,p) 5 
Wap) ee ly adar 


While Vo 


G+ Luy/voJ 


#0 do 


(t,,t,) + Cu, »U,) ~ (Vv) »VIq 
(uy so) + (vy vo) 


(Vj 2V5) + (ty»t,) 3 


By Lamé's Theorem, this algorithm requires a maximum 
number of divisions when a and p are consecutive 
Fibonacci numbers. In this case (see Knuth [19], page 
320), the number of divisions < [4.785 + log, a + 1.672] - 
2y8< 45 10g, oP . The average number of divisions required, 
however, is 5 log,P (Collins [14))" and it asethis 
latter estimate which will be used. 

It may happen that during the h'th step of the 
triangularization process a zero pivot element is encount- 


ered. (That is, no non-zero pivot can be obtained by 


rearranging the rows of the matrix.) In this case 


Gaps ie (mod Pj) no longer exists; however, it is now 
known that d, = 0 . Then either d= 0 (in which case 


the coefficient matrix A is singular and no solution 


exists), or d is amultiple of p, . In the latter 
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case Vx can no longer be found from the equation 
= Fy eee 

Several approaches have been taken to this 'bad' 
prime problem. The probability that Py is a factor 
of d is small. Therefore Takahasi and Ishibashi [27] 
and Newman [24] suggest terminating the algorithm on the 
assumption that A is singular. Borosh and Fraenkel 
[10], on the other hand, discard this prime and use 
another if any previous or subsequent moduli show that d 
neynotman fact zero. 

However, Cabay [13] has shown that Vx can be found 
even when dy. is zero. The column with the zero pivot 
element (column j, say) is ignored and the rest of the 
columns are eliminated as if column j were not present. 
Two cases arise. If rank [A,b]J = n (mod Py) it can 


be shown that 


n-1_ (0) (3-2) (3-1) (n-2) 


Cy) = (-1) ale eee j-1,j-1 abreieelbe ee 


; »6n-2) 
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n-l,snn > 


that CY, = OUP ror ky tl ice ess andathar Cy); : 
P16 )-1 5, Can be found@ py back-substitution in an-2), = 
0 (mod p,) . If rank [A,b] <n (mod p,) , then a 
second zero pivot element is encountered and Vy must be 
the zero vector. A slight modification of the Gaussian 


elimination algorithm is therefore sufficient to handle 


a 'bad' prime, should one occur. 
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From dy. and Yi » l<k<t  , a solution 
y = Vy (mod Py ee dy (mod P,) is constructed by 
means of the Chinese Remainder Theorem for integers. 
There are two constructive proofs of this theorem: the 
Newtonian formula and the Lagrangian formula. Lipson 
[21] has shown that the Newtonian formula is to be 
preferred in as much as the number of moduli required 
need not be known in advance, the storange requirements 
are smaller, and fewer multiplications/divisions need be 
performed. 

The Newtonian formula for constructing an integer I 


satisfying I = Uy (mod P}) fon ivk<u Is "plveu Dy. 


I = a, + a,P, + 43P)Py t++-+ A,P]Po+++Py_y Cort 


where ay Sy = Uy 


- eee). =f, 
de Cuy = Sy) (Py Po ++ *Py_y? (mod P,?) 
Se ete oe eo Pie ee 


: -l_- -1l 
It is assumed that cy. F pieiPeeSp ie (mod P}) sia. SK Stir, 


is pre-computed, again by means of the extended Euclidean 


algorithm. 


Lipson [21] also analysed three implementations of 
the Newtonian formula. The best of these (from the view- 
point of operations and storage required) proceeds as 


follows: 
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For 1 = k=-2" step -1 to "1 #do 
(mod Py) ; 
Vie Cuy-v) ich, (mod Py) ‘ 


Then I is computed by Horner's rule: 
I = CC. +6 Car Py) tap _y) Peo tay_o) +e tay) Py + a, - (3.1.2) 


Observe that single-precision operations only are required 
to compute the coefficients a 
The integer I so constructed is only one out of 


infinitely many integers satisfying the congruences 
ge = 
ie) I =u, (mod p,) Ilsk<t . 


We know however, by the Chinese Remainder Theorem, that 
there exists a unique I satisfying (*) if it is known 
that c <I <c + piPo---Py where c is an arbitrary 
integer. Since d and y may be negative, we wish to 


represent the residues symmetrically about zero. There- 
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we note that if A = (a4) is an n by n matrix and 

b = (b;) is an n by 1 vector, then y and d are 
n'th order determinants. If ais and b; have precision 
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Then 2y and 2d have precision < nw + log 2 and 

t = nw + log 2 + 1 primes are sufficient to satisfy (**). 
An upper bound on the number of modular solutions 

required is given by t . In some cases t_ solutions 

will actually be needed, but in other cases the bound 

is unnecessarily large. An alternative approach, used by 

Borosh and Fraenkel [10], is to continue computing terms 


of the mixed-radix representation of y and d 


Vwaeritned2Pi antishaP ois an” aYkPabretekesitages 
d= d, + doPy + d3P1P5 Feet. d)PyPo+* Pp iy +iae 
until Tondsomes 4}! ds = 0 and Ys is the zero vector. 


The probability that the correct solution has been found 
is high, but a substitution check is made, and additional 
terms of the mixed-radix representation are found if the 
check fails. 

The substitution check is expensive since it requires 
operations with multi-precision integers. An alternative 
termination procedure has been suggested by Cabay [13]. 
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then Ay, = d;b and x = Yz/ay is the solution if 
dy FeO lt dy Cet | ae Yr #0 , then A is singular. 
ot ds = 0 and Yr is the zero vector, it is almost 
certain that A is singular, but to be sure, more 


primes must be used until the bound (**) is satisfied. 


Syed ae Operation Counts 


The assumption is made that if d is a single- 
precision integer and e is an s-precision integer, 
calculation of (dte) requires s additions, (dxe) 
requires s multiplications and (s-1) additions, and 
(ed) requires (2s-2) multiplications/divisions and 
(s-1) additions/subtractions. 

The formation of t modular systems, then, requires 


the following operations: 
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The Gaussian elimination algorithm requires the 
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and that division by Py is performed to keep all 


integers single-precision, then a single solutions 
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During back-substitution, multiplication by the inverse 


of ann? l<h<n , is performed. Since all of these 
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triangularization process, it follows that: 
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To find the mixed-radix representation (Equation 


(3.1.1)) of an integer from t modular values requires: 
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Application of Horner's rule (Equation (3.1.2)) requires: 
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where s is the precision of the elements of A and b , 
and w and m are defined by (3.1.3). 
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a system of linear equations Ax = b where the elements 


of A and b are multi-variate polynomials with 
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integer coefficients is considered. As in the integer 
Gases piCesond ey eesatistying. Ay. = db Jare found, and 
then y/d is the required solution. The components 


of d and y are polynomials of the form 


where e = C@3 Eno +++ 9€)) and a, & Les 


To ensure that none but the initial and final steps — 
require multi-precision operations, the system is 
solved, as in the integer case, modulo a number of 
primes P, + For each prime P, the solution is repre- 
sented by dy and Vy with coefficients which are con- 


gruent modulo to the “coefficients of ‘“d!“and* “y 


Py 
respectively. The final step is construction of the 
multi-precision coefficients of y and d by means of 
the Chinese Remainder Theorem for integers. Since the 
latter process has been described in section 3.1, the 
problem to be discussed here is that of finding the 
solution’ "dad (mod*=p)S"y*(mod*p) Y°or+a system VAy “= "db 
(mod p) . 

To solve the polynomial system Ay = db (mod p) we 
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appropriate number of points c = (0) 2Ca a+ 9C,) eke 
each point ec the system of polynomial equations then 
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solved by methods already described. The solution of 
the integer system represents the polynomial solution 
of the original system evaluated at the point ec. 
The polynomial solution can be constructed uniquely by 
interpolation as long as integer solutions have been 
found for a sufficient number of evaluation points. (See 
Figure 3.) 

In the congruential setting, evaluation of a poly- 
nomial F(X Xo 06+ 9X) (mod p) at the point c cor- 


responds to finding the residue of E(x) »Xo 000+ 9%) on 
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need not be linear; in practice it is convenient to 
choose linear moduli of the form (x.-c;) so that the 
residues can be found by evaluation rather than by 
division... 

In the remainder of this section, some of the 
difficulties which have been ercoune ere with the con- 
gruential method are considered. 
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when the determinant d of Apuatethe.point: arls VS 
multiple of p , or when d _ contains the factor 
(x;-c,) EOD SQMG, wisssngliSti as, 200 when d is actually 
ZEYOmegli ad vis not zero, McClellan [23] suggests 
discarding the prime p or the evaluation point, depend- 
ing on which is responsible for the zero determinant. 
This discarding process is costly, for several modular 
solutions previously computed may have to be discarded 
when p or (x -c;) is found to be 'bad'. This waste 
is needless in as much as a zero value for dy. is 
perfectly legitimate and Y(Cy Co eee oC) can still be 
found by Cabay's [13] method. 

A second type of 'bad' prime can occur when con- 
structing a polynomial ECX) Xo a0 0 9X) (mod, p) « from its 
values at an appropriate number of points ec. For 
polynomials over the integers, the interpolation problem 
is well-posed as long as f is known at a sufficient 
number of points. For polynomials over the integers 
(mod p), the following additional condition is imposed 


on the i'th component of (Cy Co oe ee oC) for vaults 72 
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That is, from integers each representing the solution 
(mod p) at points (Oy sCos+ee 9c) » polynomials in one 


variable x, each representing the solution (mod p) 


evaluated at CX) Xo oee2 Xa) = (Cy 2Cooe ee oC _y) 
are constructed; then polynomials in two variables Xr 
and Xn each representing the solution (mod p) 
evaluated at OR) 9X aoe ee Xo) = (Cy Coase 09a) are 


found; and so On, Urtal taisolution any m variables 1s 
obtained. This process of constructing a polynomial 
(mod p) from its modular values is illustrated in 
Figure 3. Therefore it is sufficient to show that a 
polynomial E(x) Xo 0+++ X) (mod p) of degree d in 
the variable x, can be constructed uniquely from 
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is the true solution), it is clear that the two d's 
obtained are not modular representations of the same 
polynomial, and interpolation using these two partial 
solutions will not yield the correct solution. This 
case will be referred to as 'bad' primes of type three. 
In an algorithm, the fact that 3 is a 'bad' prime would 
be recognized by observing that three non-zero terms of 
the mixed-radix representation (mod 5) were found. 
Therefore three terms of the modulo 3 solution are 
required to ensure that modular representations of the 
Same polynomial solution are obtained in each case. 


Since only two terms were computed (mod 3), the modulo 3 


partial solution may be incorrect. 


Interpolation (and the termination algorithm) are 
applied recursively, starting with the integer solutions 
obtained and working upwards in the tree to obtain first 
solutions in one variable, then solutions in two 
variables, and so on. Therefore 'bad' primes may occur 
at any level in the tree. That is, a 'bad' prime may 


be an integer modulus Py (as in Figure 4) or a poly- 


nomial modulus (x,-04%?) 


° 


To ensure that early termination does not cause an 
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been noted that the terms of the mixed-radix repre- 
sentation of d, and y, which have been obtained are 
not incorrect, but that more of them are required. 
Instead of discarding the modulus, the additional terms 
can be computed. In Figure 4, for example, the additional 
term required modulo 3 (shown in dotted lines) can be | 
computed to give the correct result. 

Two observations can be made. First, since s zero 
coefficients of the mixed-radix representations of ce 


and Yr are computed before terminating modulo 


of) 


(X5- eicGor, amod P_ ), more terms are needed only if 


the degree of some other solution d modulo 


(3) 


Teer 
(x.-c ) (or mod Ps ) ais greater by at least (stl) . 
Second, if the 'bad' prime occurs after any 'good' prime, 
computation of the additional terms is not troublesome 
because it is known at the time of computation that extra 
terms are required. 

In the event that a ‘bad* prime occurs before a 
'sood' prime, additional terms of the mixed-radix repre- 
sentation must be added to a previous solution. To do so, 
the previous solution must be known. In general, sol- 
utions obtained lower in the tree are released from 
storage as solutions in more variables are obtained. 
However, it is always possible to regain solutions lower 


in the tree by evaluating a solution higher in the tree. 


For example, if a polynomial partial solution d(x, Xo) 
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is known, a previous solution d(x, ) such that 

d(x, + x5) = d(x) [mod (x)-¢)] can be obtained by 
evaluating d(x, 9X5) at x =c . The computation 
required to do so is certainly less than that required 
to solve again, or to solve with another prime. 

In some cases, no previous solutions have to be 
found before an additional term can be added. In 
Figure 4, for example, d; =i (moda 3) =; ya > C220) 
(mod 3) are immediately available when the modulo 5 
solution indicates that an additional term is required. 
The additional term can be computed without obtaining 
againgtnes solutions sats xp = 0s. xX == 1.5 (mod sane. 

'Bad' primes of type three, then, can always be 
used, and can often be used with no loss of computation. 
Additional computation is sometimes required, but it is 
certainly less than that required when the prime is 


discarded. 


3.2.2 Operation Counts 

As in Chapter Two, the analysis is restricted to 
the case of polynomials of degree d in each of r 
variables and fecane single-precision coefficients. It 
is therefore sufficient to find solutions at (nd+1)™ 
points modulo t primes p, where t is defined by 
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Construction of the multi-precision coefficients of 
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#additions = (nd+1)? (nt1) (3 t? - 5 t) 
Dis ae 3 3 2 3 
= + pad ee = oe 
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Therefore the complete solution requires the following 


arithmetic operations: 
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CHAPTER FOUR 


Conclusions 


In this chapter the multi-step and congruential 
methods are compared on the basis of the results 
established in Chapters Two and Three. The order of 
complexity of the two methods are noted, and some 
observations are made about when one method can be 
expected to require fewer arithmetic operations than the 
other. 

From Chapter Two, the complexity of the one-step, 

SZ 


two-step, (and in general multi-step): methods is O(n'w ) 


for systems of n equations with integer coefficients 


r 
Il dw’) for systems with polynomial 
i=l 


coefficients of degree d; in the variables Xs > aS 


and o(n?t tr, 


where w is defined by (2.2.1) and (2.3.1) respectively. 
Even with fast multiplication, these orders are at least 


4 r+yu, © 
Wiwetog nn and )n ea d.)w log n . For the con- 


i=l 
gruential method, it was established in Chapter Three 
r 
+ 
that these orders are n'w and n™ a II d.)w respec- 
| i=l 


tively. The congruential method is therefore certainly 
superior for large n . It is stressed that this con- 
clusion contradicts that of Bareiss [3,4], who states 

that with optimal fast arithmetic the multi-step methods 


have the same order of complexity as the congruential 
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methods, but with a smaller leading coefficient. 

We consider next the values of n for which the 
congruential method for integers uses fewer operations. 
Assume that the linear equations have single-precision 


coefficients such that 


i 
log n + log (ctl) seu (4.1) 


where (m-1) is the largest single-precision integer and 
G@uyisydefined as in (3.1.3). (That is. for small <n ~, 

the coefficients are slightly less than the maximum single- 
precision value.) Then for the two-step method with 


classical multiplication, the operations required are: 


= net £ n’ + + n? + = n? + ca Naty 23M 
+ = a ¢ +33 i a5 ra = ee 2 n= "2dA 
For the congruential method, the total is: 
Rae + ne ene Co + - log,m) + ne + ‘ log,m) - 2 JM 
oF diy nee = nee n? (53 + = log,m) + n(& meaner wire 


The number of operations required by the congruential 
method depends on the word-size of the computer being used. 


For a 32-bit word (where one bit is a sign bit) 


log,m = log, (2°*) sero 1 
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Then for the congruential method the arithmetic oper- 


ations required are: 


3° n AP STi n + aes Ni = 2 1M 
1 4 chs Rome: See 225 
+ =a see, pediinaate a 
[5 n+ pak * rey ee 1 nel Asad. 


In this case, the congruential method uses fewer multi- 
plications/divisions for n> 16 , fewer additions/ 
subtractions for n> 2 _ , and fewer operations in total 
LOY tone. 

In the polynomial case, a reliable exact comparison 
cannot be made on the basis of the approximate operation 
counts obtained. However, since the congruential method 


is asymtotically better than the multi-step methods by 
r 

ENR 
i=l 


for smaller n than in the integer case if rand d; 


a Lactonr, Of en sow » the method may be superior 


are large. Experimental evidence is needed here, 
(including experiments with algorithms using fast multi- 
plication techniques) to establish the cross-over point 
at which the congruential method is superior. McClellan 
(23] is currently working on such experiments. 

It should be noted that as the word size of the 
computer decreases, the performance of the congruential 
method in relation to the multi-step methods improves. 
With a word-size of 16 bits, for example, the congruential 


method uses fewer total operations for n> 4 (for the 
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integer case when equation (4.1) is satisfied). 

In conclusion, two observations not central to the 
theme of this study, but never-the-less of interest, can 
be made. 

First, the congruential algorithm described in 
Chapter Three is not optimal in a theoretical sense, 
since an Cun method for solving systems of linear 
equations exists. Strassen [26] has devised fast matrix 
multiplication and inversion algorithms which are 
os} » In the congruential setting, this yields an 
O(n? By) method for integers. However, the coefficient 


preceding noe ee is very large (approximately 12). 


Since 12 te < n? only when n > Wa » the method, 
while theoretically of interest, is not practical. 
Second, if multiplication and division are more 
costly operations than addition and subtraction, as they 
are for many computers, a method proposed by Winograd 
[28] is of interest. He uses block Gaussian elimination 
in conjunction with matrix multiplication and inversion 
algorithms which exchange about half of the multiplication/ 
division operations for additions/subtractions. The 
total operation count is approximately the same as tra- 
ditional Gaussian elimination, but fewer of these are 
multiplications/divisions. This method has the dis- 


advantage (as does Strassen's) that certain submatrices of 


the coefficient matrix must be non-singular. 
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